Accurate whole-body multi-person pose estimation and tracking is an important yet challenging topic in computer vision. To capture the subtle actions of humans for complex behavior analysis, whole-body pose estimation covering the face, body, hands and feet is essential and goes beyond conventional body-only pose estimation. In this paper, we present AlphaPose, a system that performs accurate whole-body pose estimation and tracking jointly while running in real time. To this end, we propose several new techniques: Symmetric Integral Keypoint Regression (SIKR) for fast and fine localization, Parametric Pose Non-Maximum-Suppression (P-NMS) for eliminating redundant human detections, and Pose Aware Identity Embedding for joint pose estimation and tracking. During training, we use a Part-Guided Proposal Generator (PGPG) and multi-domain knowledge distillation to further improve accuracy. Our method localizes whole-body keypoints accurately and tracks humans simultaneously, even given inaccurate bounding boxes and redundant detections. We show a significant improvement over current state-of-the-art methods in both speed and accuracy on COCO-wholebody, COCO, PoseTrack, and our proposed Halpe-FullBody pose estimation dataset. Our model, source code and dataset are publicly available at https://github.com/MVIG-SJTU/AlphaPose.
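Integral keypoint regression reads a keypoint location off a heatmap as an expectation rather than as a hard argmax, which allows sub-pixel localization. The sketch below shows only the plain (non-symmetric) integral form on a 1D heatmap, assuming softmax normalization; the symmetric design that gives SIKR its name is not reproduced, and the function name `integral_coordinate` is hypothetical.

```python
import math

def integral_coordinate(logits):
    """Return the expected index under softmax(logits) (plain soft-argmax)."""
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return sum(i * w for i, w in enumerate(weights)) / total

# A heatmap that is symmetric around index 2.
heatmap = [0.0, 1.0, 4.0, 1.0, 0.0]
coord = integral_coordinate(heatmap)
```

When two neighboring bins share the peak, the expectation falls between them, which a hard argmax cannot express.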
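The Transformer has been an indispensable staple in deep learning. However, for real-life applications, it is very challenging to deploy efficient Transformers because of the models' immense parameter counts and operations. To relieve this burden, exploiting sparsity is an effective approach to accelerating Transformers. Newly emerging Ampere GPUs leverage a 2:4 sparsity pattern to achieve model acceleration, but this pattern can hardly meet the diverse algorithm and hardware constraints that arise when deploying models. By contrast, we propose an algorithm-hardware co-optimized framework to flexibly and efficiently accelerate Transformers by using general N:M sparsity patterns. (1) From the algorithm perspective, we propose a sparsity inheritance mechanism along with an inherited dynamic pruning (IDP) method to rapidly obtain a series of N:M sparse candidate Transformers. A model compression scheme is further proposed to significantly reduce the storage requirement for deployment. (2) From the hardware perspective, we present a flexible and efficient hardware architecture, namely STA, to achieve significant speedup when deploying N:M sparse Transformers. STA features not only a computing engine that unifies sparse-dense and dense-dense matrix multiplications with high computational efficiency, but also a scalable softmax module that eliminates the latency of intermediate off-chip data communication. Experimental results show that, compared with other methods, N:M sparse Transformers generated using IDP achieve an average accuracy improvement of 6.7%. Moreover, STA achieves 14.47x and 11.33x speedups over an Intel i9-9900X and an NVIDIA RTX 2080 Ti, respectively, and performs inference 2.00-19.47x faster than state-of-the-art FPGA-based accelerators for Transformers.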
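To make the N:M pattern concrete, here is a minimal sketch that keeps at most N nonzero values in every consecutive group of M weights, which is how the 2:4 pattern on Ampere GPUs is defined. Selecting survivors by magnitude is an assumption made for illustration; it is not the IDP method, and `prune_n_m` is a hypothetical helper name.

```python
def prune_n_m(weights, n, m):
    """Keep the n largest-magnitude values in each consecutive group of m; zero the rest."""
    if len(weights) % m != 0:
        raise ValueError("length must be a multiple of m")
    pruned = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Indices of the n largest magnitudes within this group.
        order = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)
        keep = set(order[:n])
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned

row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
sparse_row = prune_n_m(row, n=2, m=4)
```

Because every group has the same number of survivors, the nonzero values and their in-group positions can be stored in fixed-size arrays, which is what makes the pattern hardware friendly.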
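Real-time bidding (RTB) is an important mechanism in modern online advertising systems. Advertisers employ bidding strategies in RTB to optimize their advertising effects subject to various financial requirements, among which a widely adopted one is the return-on-investment (ROI) constraint. During the sequential bidding process, ROI changes non-monotonically, usually producing a see-saw effect between constraint satisfaction and objective optimization. Existing solutions to this constraint-objective trade-off are typically established in static or mildly changing markets. However, these methods fail significantly in non-stationary advertising markets because they cannot adapt to varying dynamics and partial observability. In this work, we specialize in ROI-constrained bidding in non-stationary markets. Based on a partially observable Markov decision process, we propose the first hard-barrier solution that accommodates non-monotonic constraints. Our method exploits a parameter-free indicator-augmented reward function and develops a Curriculum-Guided Bayesian Reinforcement Learning (CBRL) framework to adaptively control the constraint-objective trade-off in non-stationary advertising markets. Extensive experiments on a large-scale industrial dataset with two problem settings show that CBRL generalizes well in both in-distribution and out-of-distribution data regimes and enjoys outstanding stability.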
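The non-monotonic behavior of ROI during sequential bidding can be illustrated with a short sketch. It assumes ROI is defined as cumulative value divided by cumulative cost over the impressions won so far; that definition, the toy numbers, and the name `running_roi` are illustrative assumptions rather than the paper's formulation.

```python
def running_roi(values, costs):
    """Cumulative ROI after each won impression: total value / total cost."""
    rois, total_value, total_cost = [], 0.0, 0.0
    for value, cost in zip(values, costs):
        total_value += value
        total_cost += cost
        rois.append(total_value / total_cost if total_cost > 0 else float("inf"))
    return rois

values = [4.0, 1.0, 6.0, 1.0]  # hypothetical value of each won impression
costs = [2.0, 2.0, 2.0, 2.0]   # hypothetical price paid for each
rois = running_roi(values, costs)
target = 1.4                    # hypothetical ROI lower bound
satisfied = [r >= target for r in rois]
```

In this toy run the constraint is met, violated, and met again as the episode proceeds, so a check at an intermediate step says little about the final outcome.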
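In the management of lung nodules, we want to predict nodule evolution in terms of diameter variation on computed tomography (CT) scans and then provide a follow-up recommendation according to the predicted growth trend of the nodule. To improve the performance of growth trend prediction for lung nodules, it is vital to compare the changes of the same nodule across consecutive CT scans. Motivated by this, we screened out 4,666 subjects with more than two consecutive CT scans from the National Lung Screening Trial (NLST) dataset to organize a temporal dataset called NLSTT. Specifically, we first detect and pair regions of interest (ROIs) covering the same nodule based on registered CT scans. After that, we predict the texture category and diameter of the nodules through models. Finally, we annotate the evolution class of each nodule according to its change in diameter. Based on the constructed NLSTT dataset, we propose a Siamese encoder to simultaneously exploit the discriminative features of 3D ROIs detected from consecutive CT scans. We then design a novel spatial-temporal mixer (STM) to leverage the interval changes of the same nodule in sequential 3D ROIs and to capture the spatial dependencies between nodule regions and the current 3D ROI. Following the clinical diagnosis routine, we employ a hierarchical loss to pay more attention to growing nodules. Extensive experiments on our organized dataset demonstrate the advantage of the proposed method. We also conduct experiments on an in-house dataset to evaluate the clinical utility of our method by comparing it against skilled clinicians.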
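The support vector machine (SVM) is a powerful classification method that has achieved great success in many fields. Since its performance can be seriously impaired by redundant covariates, model selection techniques are widely used for SVMs with high-dimensional covariates. As an alternative to model selection, significant progress has been made in the area of model averaging over the past decades. Yet no frequentist model averaging method has been considered for the SVM. This work aims to fill the gap by proposing a frequentist model averaging procedure for the SVM that selects the optimal weights by cross-validation. Even when the number of covariates diverges at an exponential rate of the sample size, we show the asymptotic optimality of the proposed method, in the sense that the ratio of its hinge loss to the lowest possible loss converges to one. We also derive the convergence rate, which provides more insight into model averaging. Compared with model selection methods for the SVM, which require the tedious but critical task of tuning parameter selection, the model averaging method avoids this task and shows promising performance in empirical studies.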
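The idea of choosing averaging weights by held-out hinge loss can be sketched for the simplest case of two candidate models. The inputs `scores_a` and `scores_b` are assumed to be out-of-fold decision values from two candidate SVMs, and the grid search over a single weight is an illustrative simplification, not the paper's optimization procedure; all names are hypothetical.

```python
def hinge_loss(labels, scores):
    """Mean hinge loss for labels in {-1, +1}."""
    return sum(max(0.0, 1.0 - y * s) for y, s in zip(labels, scores)) / len(labels)

def best_averaging_weight(labels, scores_a, scores_b, steps=20):
    """Grid-search w in [0, 1] minimizing the hinge loss of w*a + (1-w)*b."""
    best_w, best_loss = 0.0, float("inf")
    for k in range(steps + 1):
        w = k / steps
        combined = [w * a + (1.0 - w) * b for a, b in zip(scores_a, scores_b)]
        loss = hinge_loss(labels, combined)
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss

labels = [1, -1, 1, -1]
scores_a = [2.0, -2.0, -1.0, 1.0]   # candidate A is right on the first two points
scores_b = [-1.0, 1.0, 2.0, -2.0]   # candidate B is right on the last two points
weight, loss = best_averaging_weight(labels, scores_a, scores_b)
```

In this toy case each candidate alone has a mean hinge loss of 1.0, while a mixture of the two does better, which is the motivation for averaging instead of selecting a single model.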
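Accurately segmenting teeth and identifying the corresponding anatomical landmarks on dental mesh models are essential in computer-aided orthodontic treatment. Performing these two tasks manually is time-consuming, tedious, and, more importantly, highly dependent on the orthodontist's experience because of the abnormality and large-scale variance of patients' teeth. Some machine learning-based methods have been designed and applied in the orthodontic field to automatically segment dental meshes (e.g., intraoral scans). In contrast, the number of studies on tooth landmark localization is still limited. This paper proposes a two-stage framework based on mesh deep learning (called TS-MDL) for joint tooth labeling and landmark identification on raw intraoral scans. TS-MDL first adopts an end-to-end iMeshSegNet method (i.e., a variant of the existing MeshSegNet with improved accuracy and efficiency) to label each tooth on the downsampled scan. Guided by the segmentation outputs, TS-MDL then selects each tooth's region of interest (ROI) on the original mesh to construct a light-weight variant of the pioneering PointNet (i.e., PointNet-Reg) for regressing the corresponding landmark heatmaps. TS-MDL was evaluated on a real clinical dataset and showed promising segmentation and localization performance. Specifically, iMeshSegNet in the first stage of TS-MDL reached an average Dice similarity coefficient (DSC) of 0.964 ± 0.054, significantly outperforming the original MeshSegNet. In the second stage, PointNet-Reg achieved a mean absolute error (MAE) of 0.597 ± 0.761 between the prediction and the ground truth for 66 landmarks, which is superior to other networks for landmark detection. All these results suggest the potential usage of TS-MDL in clinical practice.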
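The two reported metrics can be computed as in the following sketch. The Dice formula is the standard one; treating the landmark error as the mean Euclidean distance between paired 3D points is an assumption about how the MAE is measured, and both function names are hypothetical.

```python
import math

def dice_coefficient(pred_ids, true_ids):
    """Dice similarity coefficient between two sets of mesh-cell ids: 2|A∩B| / (|A| + |B|)."""
    pred, truth = set(pred_ids), set(true_ids)
    if not pred and not truth:
        return 1.0  # both empty: treat as perfect agreement
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))

def mean_landmark_error(pred_points, true_points):
    """Mean Euclidean distance between paired predicted and ground-truth landmarks."""
    distances = [math.dist(p, t) for p, t in zip(pred_points, true_points)]
    return sum(distances) / len(distances)

dsc = dice_coefficient([1, 2, 3, 4], [3, 4, 5, 6])
error = mean_landmark_error([(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
                            [(3.0, 4.0, 0.0), (1.0, 1.0, 1.0)])
```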
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After these steps, performance on novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
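One common way to turn a support mask into a class center is masked average pooling, followed by channel-wise re-weighting of the query features. The sketch below illustrates that generic recipe under the assumption that features are stored as per-pixel channel vectors; it is not the RefT module itself, and both function names are hypothetical.

```python
def masked_average_pool(features, mask):
    """Average the channel vectors of the pixels selected by a 0/1 mask."""
    count = sum(mask)
    if count == 0:
        raise ValueError("mask selects no pixels")
    channels = len(features[0])
    return [sum(f[c] * m for f, m in zip(features, mask)) / count
            for c in range(channels)]

def reweight(query_features, center):
    """Scale each query channel by the matching class-center channel."""
    return [[q * w for q, w in zip(pixel, center)] for pixel in query_features]

support = [[1.0, 0.0], [3.0, 2.0], [10.0, 10.0]]  # three pixels, two channels
mask = [1, 1, 0]                                   # the last pixel is background
center = masked_average_pool(support, mask)
enhanced = reweight([[1.0, 4.0]], center)
```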
Benefiting from its ability to exploit intrinsic supervision information, contrastive learning has recently achieved promising performance in deep graph clustering. However, we observe that two drawbacks of the positive and negative sample construction mechanisms prevent existing algorithms from improving further. 1) The quality of positive samples depends heavily on carefully designed data augmentations, while inappropriate augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct the positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls together samples from the same cluster and pushes away those from other clusters by maximizing the cross-view cosine similarity between positive samples and minimizing it between negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
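An objective of this kind can be sketched with cosine similarities alone. The squared-error form used here (positive pairs pulled toward similarity 1, centers of different clusters pushed toward similarity 0) is an assumption chosen for illustration and may differ from the loss used in CCGC; the function names are hypothetical, and all vectors are assumed to be nonzero.

```python
import math

def cosine(u, v):
    """Cosine similarity between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def contrastive_loss(view1, view2, centers1, centers2):
    """Pull paired cross-view embeddings together; push different cluster centers apart."""
    pos = [(cosine(a, b) - 1.0) ** 2 for a, b in zip(view1, view2)]
    neg = [cosine(c1, c2) ** 2
           for i, c1 in enumerate(centers1)
           for j, c2 in enumerate(centers2) if i != j]
    return sum(pos) / len(pos) + sum(neg) / len(neg)

centers = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                           [[2.0, 0.0], [0.0, 3.0]], centers, centers)
misaligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                              [[0.0, 1.0], [1.0, 0.0]], centers, centers)
```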
As one of the prevalent methods for building automated systems, Imitation Learning (IL) shows promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, research on the explainability of IL models is still limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explanation framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in demonstrations. It iteratively retrains the black-box IL model on randomly masked demonstrations and uses the conventional evaluation outcome, the environment return, as the coefficient to build an importance map. We also conduct experiments to investigate three major questions concerning the equality of frame importance, the effectiveness of the importance map, and the connections between importance maps from different IL models. The results show that R2RISE successfully distinguishes important frames in the demonstrations.
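The masking-and-scoring loop can be sketched as follows. The `evaluate` callback stands in for the expensive step of retraining an IL model on the masked demonstration and measuring its environment return; the toy callback, the normalization by the number of times each frame was kept, and all names are assumptions for illustration, not the R2RISE implementation.

```python
import random

def importance_map(num_frames, evaluate, num_masks=200, keep_prob=0.5, seed=0):
    """Average the return over the random masks in which each frame was kept."""
    rng = random.Random(seed)
    weighted = [0.0] * num_frames
    counts = [0] * num_frames
    for _ in range(num_masks):
        mask = [1 if rng.random() < keep_prob else 0 for _ in range(num_frames)]
        ret = evaluate(mask)  # placeholder for retraining plus environment rollout
        for i, kept in enumerate(mask):
            if kept:
                weighted[i] += ret
                counts[i] += 1
    return [w / c if c else 0.0 for w, c in zip(weighted, counts)]

# Toy stand-in: frame 2 contributes far more to the return than any other frame.
toy_return = lambda mask: 10.0 * mask[2] + sum(mask)
scores = importance_map(5, toy_return)
most_important = max(range(5), key=scores.__getitem__)
```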
Research interest in sequential recommender systems, which aim to model dynamic sequence representations precisely, is increasing. However, the loss functions most commonly used in state-of-the-art sequential recommendation models have essential limitations. To name a few: Bayesian Personalized Ranking (BPR) loss suffers from the vanishing gradient problem caused by numerous negative samples and prediction biases; Binary Cross-Entropy (BCE) loss is subject to the number of negative samples, so it is likely to ignore valuable negative examples and reduce training efficiency; Cross-Entropy (CE) loss focuses only on the last timestamp of the training sequence, which causes low utilization of sequence information and results in an inferior user sequence representation. To avoid these limitations, in this paper we propose to calculate a Cumulative Cross-Entropy (CCE) loss over the sequence. CCE is simple and direct, and enjoys the virtues of painless deployment, no negative sampling, and effective and efficient training. We conduct extensive experiments on five benchmark datasets to demonstrate the effectiveness and efficiency of CCE. The results show that employing CCE loss on three state-of-the-art models, GRU4Rec, SASRec, and S3-Rec, yields average improvements of 125.63%, 69.90%, and 33.24% in full-ranking NDCG@5, respectively. With CCE, the performance curve of the models on the test data rises rapidly with wall-clock time and is superior to that of other loss functions during almost the whole training process.
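The contrast between last-step CE and a cumulative loss can be sketched as below. Summing the per-step cross-entropy without weighting is an assumption about how the accumulation is done; the paper may average or weight the terms differently, and the function names are hypothetical.

```python
import math

def cross_entropy(logits, target):
    """Negative log-softmax probability of the target item."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[target]

def last_step_ce(step_logits, targets):
    """CE that supervises only the final position of the sequence."""
    return cross_entropy(step_logits[-1], targets[-1])

def cumulative_ce(step_logits, targets):
    """CE accumulated over every position of the sequence."""
    return sum(cross_entropy(l, t) for l, t in zip(step_logits, targets))

# Three timesteps over a two-item catalog, with uninformative (uniform) logits.
step_logits = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
targets = [0, 1, 0]
last = last_step_ce(step_logits, targets)
total = cumulative_ce(step_logits, targets)
```

Every position contributes a gradient signal in the cumulative form, whereas the last-step form discards the supervision available at earlier positions.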